Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pairwise #627

Merged
merged 22 commits into from
May 2, 2021
Merged

Add pairwise #627

merged 22 commits into from
May 2, 2021

Conversation

nalimilan
Copy link
Member

This generic method takes iterators of vectors and supports skipping missing values.
It is a more general version of pairwise in Distances.jl.
Since methods are compatible, both packages can override a common empty function defined in another API package.

See nalimilan/FreqTables.jl#54. Cc: @dkarrasch @bkamins @quinnj

This generic method takes iterators of vectors and supports skipping
missing values.
It is a more general version of `pairwise` in Distances.jl.
Since methods are compatible, both packages can override a common empty
function defined in another API package.
src/pairwise.jl Outdated Show resolved Hide resolved
src/pairwise.jl Outdated Show resolved Hide resolved
src/pairwise.jl Outdated Show resolved Hide resolved
src/pairwise.jl Outdated Show resolved Hide resolved
src/pairwise.jl Outdated Show resolved Hide resolved
src/pairwise.jl Outdated Show resolved Hide resolved
src/pairwise.jl Outdated Show resolved Hide resolved
Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, only two comments about indexing/iterating x and y and one-based indexing into res.

If I haven't missed anything, then these methods here are more generic than the generic methods in Distances.jl. I believe one could map those to the _pairwise! "kernels" here.

EDIT: Aha, I guess one difference could be that you're assuming here that you apply pairwise to iterators of (indexable) vectors, whereas pairwise in Distances.jl applies to iterators of iterators.

src/pairwise.jl Outdated Show resolved Hide resolved
src/pairwise.jl Outdated Show resolved Hide resolved
@nalimilan
Copy link
Member Author

I've pushed commits which should address all comments. I've also made two changes:

  • Add pairwise! since it is standard practice to have in-place variants
  • When skipmissing=:none, allow entries to be any objects, and not just vectors. This is more consistent with Distances.jl as @dkarrasch noted.

@bkamins
Copy link
Contributor

bkamins commented Jan 10, 2021

Nice. Just to go back for a second to the issue with missing. Let us confirm that we really want:

julia> x = [missing, missing, missing]
3-element Array{Missing,1}:
 missing
 missing
 missing

julia> pairwise(cor, Ref(x), Ref(x))
1×1 Array{Missing,2}:
 missing

as this is the first case when the docstring is not correct (but I think it is OK as you have no way to infer the type of "one").

The second case is:

julia> x = Union{Int, Missing}[missing, missing, missing]
3-element Array{Union{Missing, Int64},1}:
 missing
 missing
 missing

julia> pairwise(cor, Ref(x), Ref(x), skipmissing=:pairwise)
ERROR: ArgumentError: correlation only defined for non-empty vectors

(we error although we promise to return 1, but again - maybe it is OK to leave this corner case as is)

@nalimilan
Copy link
Member Author

Good catch. Yes I'm not sure we can do much about it. Technically we return one(missing), so the docstring isn't totally wrong in the first case. :-p I'm not sure it's worth mentioning as it sounds difficult to explain without making the sentence very complex, but at least I've added tests so that we don't change the behavior without noticing. This made me discover that in some cases we threw a MethodError because the destination was Matrix{Union{}} instead of letting cor throw the expected ArgumentError.

I've also found a simple way to use promote_type rather than promote_typejoin to choose the element type, while still having the function inferable. I think this makes more sense as it gives a concrete eltype (Float64 in general) if you have columns of different types (which is common for tables), and it justifies using pairwise rather than just map(f, Iterators.product(x, y)).

@bkamins
Copy link
Contributor

bkamins commented Jan 11, 2021

Looks good in general (but CI fails on Julia 1.0, so probably a separate code path for it is required).

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, looks good.

src/pairwise.jl Outdated Show resolved Hide resolved
@nalimilan
Copy link
Member Author

I've created a StatsAPI package, with a PR to define pairwisethere: JuliaStats/StatsAPI.jl#1

@pdeffebach
Copy link

Could we implement a special case for pairwise(cor, x::Vector{Number}, y::Vector{Number}) to allow scalar returns?

Or would that be a different function entirely.

@nalimilan
Copy link
Member Author

Could we implement a special case for pairwise(cor, x::Vector{Number}, y::Vector{Number}) to allow scalar returns?

Or would that be a different function entirely.

That would have to be another function (more akin to skipmissings), as this PR already gives a meaning to this signature. Vectors of numbers are treated like any other collection:

julia> pairwise(tuple, [1, 2], [3, 4])
2×2 Matrix{Tuple{Int64, Int64}}:
 (1, 3)  (1, 4)
 (2, 3)  (2, 4)

I've added a commit to import pairwise and pairwise! from StatsAPI. With JuliaStats/Distances.jl#213, StatsBase and Distances will cohabit nicely without method ambiguities. I'll merge in a few days barring objections.

@bkamins
Copy link
Contributor

bkamins commented Apr 5, 2021

OK. Looks good.

@wildart
Copy link
Contributor

wildart commented Apr 28, 2021

When is this expected to be merged? I'd like to use new APi in JuliaStats/MultivariateStats.jl#148.

@nalimilan nalimilan merged commit d18762c into master May 2, 2021
@nalimilan nalimilan deleted the nl/pairwise branch May 2, 2021 14:38
@nalimilan
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants